
write results.json for structured agent consumption, harden crash diagnostics#331

Open
shichangs wants to merge 2 commits into karpathy:master from shichangs:fix/structured-results-output

Conversation

@shichangs

Summary

Fixes #64

  • train.py: writes results.json after evaluation with the same metrics already printed to stdout. Gives agents a structured, parseable results channel instead of relying on grepping free-form stdout from run.log.
  • program.md: instructs the agent to read results.json first (falling back to grep). Crash diagnostics now use a filtered grep -iE "error|exception|traceback" instead of a raw tail -n 50, reducing the surface for indirect prompt injection via training output.
  • .gitignore: adds results.json and run.log as runtime artifacts.

Existing stdout output is unchanged. No new dependencies (json is stdlib).

Similar to #79 but rebased on current main with no merge conflicts.
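The write path can be sketched as follows (the helper name is illustrative, not the PR's exact code; only the stdlib json module is involved):

```python
import json

# Illustrative sketch of the results.json write train.py performs after
# evaluation: dump the already-computed metrics dict with stdlib json only,
# so no new dependency is introduced.
def write_results(metrics, path="results.json"):
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)
```

Because the file is only written after evaluation completes, a crashed run leaves no results.json behind, which is the contract the test plan checks.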

Test plan

  • Run uv run train.py and verify results.json is written with correct metrics
  • Verify stdout output is unchanged
  • Verify results.json values match stdout summary
  • Simulate a crash (e.g. syntax error in train.py) and confirm results.json is NOT written

🤖 Generated with Claude Code

…gnostics

train.py now writes a results.json file after evaluation with the same
metrics already printed to stdout. This gives agents a structured, parseable
results channel instead of relying on grepping free-form stdout.

program.md updated to read results.json first (fallback to grep), and to
use filtered grep for crash diagnostics instead of raw tail, reducing the
surface for indirect prompt injection via training output.

Fixes karpathy#64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shichangs
Author

cc @johnwaldo (issue author) @mvanhorn (PR #79 author) — would appreciate your review. This takes a similar approach to #79 but rebased cleanly on current main.

Offline verification

The results.json output can be validated without a GPU:

import json

# Simulate what train.py writes
results = {
    "val_bpb": 0.9979,
    "training_seconds": 300.1,
    "total_seconds": 325.9,
    "peak_vram_mb": 45060.2,
    "mfu_percent": 39.8,
    "total_tokens_M": 499.6,
    "num_steps": 953,
    "num_params_M": 50.3,
    "depth": 8,
}
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

# Verify roundtrip
with open("results.json") as f:
    loaded = json.load(f)
assert loaded == results
assert isinstance(loaded["val_bpb"], float)
assert isinstance(loaded["num_steps"], int)
print("✓ results.json roundtrip OK")

The key behavioral changes:

  1. Success path: agent reads results.json (structured JSON) instead of grepping stdout → no injection surface
  2. Crash path: agent uses grep -iE "error|exception|traceback" to filter relevant lines before falling back to tail → reduced injection surface
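The two paths above can be mirrored in a small Python helper (hypothetical, shown only to make the success/crash split concrete; program.md itself uses shell commands):

```python
import json
import re
from pathlib import Path

# Hypothetical mirror of the agent's read logic: prefer structured
# results.json; on a crash, surface only filtered diagnostic lines, the
# way grep -iE "error|exception|traceback" does in program.md.
def read_run_outcome(results="results.json", log="run.log"):
    results_path = Path(results)
    if results_path.exists():
        # Success path: structured metrics, no free-form stdout in context
        return {"status": "ok", "metrics": json.loads(results_path.read_text())}
    # Crash path: keep only likely-diagnostic lines, capped at 20
    pattern = re.compile(r"error|exception|traceback", re.IGNORECASE)
    lines = Path(log).read_text().splitlines()
    return {"status": "crashed",
            "diagnostics": [l for l in lines if pattern.search(l)][-20:]}
```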


@johnwaldo johnwaldo left a comment


Clean, minimal PR — correct mitigation layer for #64. Structured JSON eliminates the agent's need to parse free-form stdout, and the crash-path grep filter is a meaningful improvement over raw tail. A few observations:

Stale results.json on killed runs

If train.py is killed mid-run (the 10-minute timeout described in program.md), a results.json from a previous successful run would still exist on disk. The agent would read valid JSON with stale metrics and treat it as current — a silent wrong-answer bug.

Suggest deleting results.json at the top of train.py before training starts:

import os

# Clear previous results so a killed run leaves no file
if os.path.exists("results.json"):
    os.remove("results.json")

This preserves the "no file = crash" contract that program.md step 5/6 relies on.

Crash-path grep is narrower but still an injection surface

The grep -iE "error|exception|traceback" filter is better than a raw tail -n 50, but an attacker can still craft output containing those words to get text into the agent's context. Worth noting this is mitigation, not elimination — a structured error.json on the crash path would close it fully, but that's arguably out of scope for this PR.
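For reference, one possible shape for that crash-path file (entirely hypothetical, and arguably out of scope here, as noted):

```python
import json
import traceback

# Hypothetical crash-path handler: wrap the training entry point so a
# failure writes a structured error.json instead of leaving the agent to
# grep free-form logs.
def run_with_error_capture(train_fn):
    try:
        train_fn()
    except Exception as exc:
        with open("error.json", "w") as f:
            json.dump({
                "error_type": type(exc).__name__,
                "message": str(exc),
                # Last frames only, to keep free-form text out of context
                "traceback_tail": traceback.format_exc().splitlines()[-5:],
            }, f, indent=2)
        raise
```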

PR #79 cleanup

Since this supersedes #79 (same approach, rebased on current main), it would be good to close #79 with a pointer here to avoid confusion.

If train.py is killed mid-run (e.g. timeout), a results.json from a
previous successful run would still exist on disk, causing the agent
to read valid JSON with stale metrics. Delete it early so that the
"no file = crash" contract holds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shichangs
Author

Thanks for the thorough review @johnwaldo!

  1. Stale results.json — Great catch. Fixed in 4318f43: results.json is now deleted at the top of train.py before training starts, preserving the "no file = crash" contract.

  2. Crash-path grep injection surface — Agreed this is mitigation, not elimination. A structured error.json on the crash path would be the complete fix, but that feels like a separate PR. Leaving as-is for now.

  3. PR #79 cleanup — Will close #79 with a pointer here.

@mvanhorn Closing #79 in favor of this one — same approach, rebased on current main. Thanks for the original work there!

@mvanhorn

Nice rebase. The crash diagnostics filter is a good addition over my original #79. Closing mine in favor of this.



Development

Successfully merging this pull request may close these issues.

Indirect prompt injection via training output fed back to agent
